Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry connections to PostgreSQL before starting martin #1545

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

teyotan
Copy link

@teyotan teyotan commented Oct 21, 2024

Trying to solve (#1539)

There are still a lot of improvements that can be done, such as making the retry params configurable and adding test for unreachable db instance. But would love to get feedbacks first before I proceed on those.

@nyurik
Copy link
Member

nyurik commented Oct 21, 2024

Thanks, looks good! We should discuss the configuration for this too as part of this PR. Jitter is a good idea, thx.

At some point we should also test/evaluate what happens when PG restarts (drops connections, switches to another primary, etc) while Martin is running - in theory PG pool should simply reconnect, but I have never tested that. Probably good for another PR.

let pg = PgBuilder::new(self, id_resolver).await?;
// Retry strategy: Fixed 5 seconds interval backoff with jitter (random variation)
let retry_strategy = FixedInterval::from_millis(5000)
.map(jitter) // Add random jitter to avoid "thundering herd" problem
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be a bit unexpected for users:

This strategy might not wait at all.
=> all 3 request may be either a total of 15s or 0s.

https://docs.rs/tokio-retry/latest/src/tokio_retry/strategy/jitter.rs.html#3-5

The amount to which jitter affects the FixedInterval should likely not be 0%-100%

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thx, I agree - we should probably do something like +/- 20% - does jitter support that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just have our own function which does what we need..
The one-liner is not that complicated 😁

Also we might want to lower the retry time and increase the retry amount. As far as I remember it (might be misremembering) Postgres takes a hot few seconds (somewhere between 10 and 20) to start up

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, suddenly got a lot on my plate. I'll continue working on this around Tuesday or Wednesday.

There are multiple parameters that we can configure:

  1. Retry strategy. Tokio_retry has 3 built in strategy, do we want it to be configurable or do we just choose one? If we do choose one, exponential or fibonacci back off might be a better choice than fixed interval.
  2. the interval (and factor if we use exponential/fibonacci). Choosing the default value for this one is a bit tricky. In my limited experience, CMIIW, cold starting a postgresql server does not take very long, <1s. It's another story if you also start a server or a container. I think a good option right now is to have retry amount of 5 and the sum of wait duration across these 5 retry is 60 seconds.
  3. Jitter. Yeah I didn't realize jitter just multiply the duration by 0-100%, my bad (first time working with rust, im guilty of skipping some part that I don't understand lol). We can do jitter by 80%-120%. Do we want jitter to be configurable (enable/disable and jitter range)?

@CommanderStorm CommanderStorm changed the title Add retry mechanism to connect to PostgreSQL before starting maplibre… retry connections to PostgreSQL before starting martin Oct 27, 2024
@CommanderStorm CommanderStorm changed the title retry connections to PostgreSQL before starting martin Retry connections to PostgreSQL before starting martin Oct 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants